Skip to content

macro_boost: add has_sideeffects + counter-lane elision#2691

Merged
borisbat merged 2 commits into
masterfrom
bbatkin/has-sideeffects-counter-lane-elision
May 17, 2026
Merged

macro_boost: add has_sideeffects + counter-lane elision#2691
borisbat merged 2 commits into
masterfrom
bbatkin/has-sideeffects-counter-lane-elision

Conversation

@borisbat
Copy link
Copy Markdown
Collaborator

@borisbat borisbat commented May 16, 2026

Summary

Adds has_sideeffects(expr) : bool to daslib/macro_boost.das — a reusable conservative side-effect predicate. Returns true when an expression has or might have side effects; false only when provably pure. Wired into daslib/linq_fold.das for three counter-lane optimizations: discardable-bind elision, count→length shortcut, and each(<array>) peeling.

Follow-up to PR #2689 (Phase 2A loop planner).

Classification

SAFE — recurse operands:

  • ExprVar, all ExprConst*, ExprAddr, ExprTypeInfo/Decl/Tag
  • ExprField, ExprSafeField, ExprSwizzle
  • ExprRef2Value, ExprRef2Ptr, ExprPtr2Ref, ExprAddr
  • ExprIs, ExprIsVariant, ExprAsVariant, ExprSafeAsVariant
  • ExprCast, ExprNullCoalescing
  • ExprStringBuilder (string heap is no-op per compiler — recurse into .elements)
  • ExprKeyExists (pure container read)
  • ExprAt when .subexpr._type is NOT isGoodTableType (tables auto-insert on missing key)
  • ExprSafeAt always
  • ExprOp1/Op2/Op3 via pure-op-name allowlist on workhorse types (bypasses func==null artifacts from partial folding); falls back to func.flags.builtIn && !knownSideEffects && !unsafeOperation. / and % blacklisted (div-by-zero panic).
  • ExprCall when func is a C++ builtin with no side-effect flags

UNSAFE (return true): everything else, including allocations (ExprNew, ExprMakeArray, ExprMakeStruct, etc.), user-defined calls, lambda invocations.

Counter-lane integration

Three planner-level uses in linq_fold.das:

1. Discardable bind elision — the per-element var vfinal = projection is now emitted only when has_sideeffects(projection) is true. Pure projections (_._field * 2) produce a bare-loop counter directly in the macro output, no optimizer DCE needed. Impure projections keep the bind, preserving the per-element evaluation invariant from test_counter_lane_projection_side_effects.

2. count→length shortcut — when the counter lane has no where-filter AND every projection in the chain is pure AND the source has known length (array/table/string/range/fixed-array), the planner emits length(src) directly. The loop is elided entirely.

3. peel_each fixeach is a daslang generic; the resolved func.name is the mangled instance, so the original peel never fired for typed chains. Now also checks func.fromGeneric.name == "each". Gated to array-shaped arguments only (isGoodArrayType || isArray) so iterator-yielding sources like each(range(10)) keep their wrapper. Block-parameter typedecl is branched on source shape: iterator sources keep -const (rvalue, must be consumable); array sources keep the source's const& modifier so let arr <- chains type-check.

Tests

  • tests/macro_boost/test_has_sideeffects.das — 24 cases (17 safe + 5 unsafe + 2 conservative-unsafe). Tests use a _test_has_sideeffects probe call_macro (tests/macro_boost/_has_sideeffects_probe.das) that runs the predicate at macro time and emits ExprConstBool of the result.
  • tests/linq/test_linq_fold_ast.das — new AST-walk tests:
    • test_pure_projection_uses_length_shortcut — invoke body returns length(src) directly, no for-loop
    • test_bare_count_uses_length_shortcut — same for each(arr).count()
    • test_impure_projection_keeps_bind — for-body has bind + ++acc
    • test_peel_each_on_array_source / _on_bare_count — assert peel fires for arrays
    • test_peel_each_skips_non_array_sourceeach(range(...)) keeps its wrapper (gate prevents iterator-source peeling)
    • test_target_each_range_count_runs — behavioral check for iterator-source chains

Benchmark deltas (100K rows, INTERP, vs Phase 2A baseline)

Benchmark Shape m3f_old m3f (this PR) Delta
select_count select → count 15 0 length-shortcut elides loop entirely
chained_where where → where → count 17 6 peel + const-ref param
count_aggregate where → count 5 4 1ns from peel
to_array_filter where → select → to_array 11 10 1ns from peel

Test plan

  • mcp__daslang__lint clean on new/edited files
  • mcp__daslang__compile_check passes on new/edited files
  • mcp__daslang__run_test on tests/macro_boost/test_has_sideeffects.das — 24/24 pass
  • mcp__daslang__run_test on tests/linq/test_linq_fold_ast.das — 51/51 pass
  • mcp__daslang__run_test on tests/linq/test_linq_fold.das — 77/77 pass (incl. existing side-effects test)
  • Full tests/linq/* suite green (569 tests across 14 files)
  • Phase 2A benchmarks held or improved

Future work

  • Function-level [no_side_effects] annotation so daslang-defined pure helpers (length, key_exists, etc.) can opt in to the allowlist. Currently only C++ builtins with builtIn=true are caught.
  • intermediateBinds elision for chained pure selects (requires solving the ExprRef2Value substitution trap).
  • length-shortcut for count(distinct(x)) and similar — needs distinct/sorted/etc. handling in the planner.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 16, 2026 22:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a reusable conservative has_sideeffects(expr) predicate to daslib/macro_boost.das and wires it into the linq_fold counter-lane planner so that the discardable var vfinal = projection bind is elided at macro time when the projection is provably pure. Comes with a probe-based test module exercising 24 expression shapes plus two AST-level integration tests asserting the elision (and its preservation for impure projections). Follow-up to PR #2689 (Phase 2A loop planner).

Changes:

  • Introduces has_sideeffects, func_has_sideeffects, is_safe_op1, is_safe_op2 ([macro_function]) in daslib/macro_boost.das.
  • Gates the counter-lane vfinal bind in daslib/linq_fold.das on has_sideeffects(projection).
  • Adds tests/macro_boost/test_has_sideeffects.das + _has_sideeffects_probe.das and two new AST tests in tests/linq/test_linq_fold_ast.das; updates the select_count row in benchmarks/sql/LINQ.md.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
daslib/macro_boost.das Adds the conservative has_sideeffects predicate and helpers.
daslib/linq_fold.das Uses has_sideeffects to skip emitting the discardable vfinal bind for pure projections.
tests/macro_boost/test_has_sideeffects.das 24 classification test cases for the new predicate.
tests/macro_boost/_has_sideeffects_probe.das Helper module exposing a _test_has_sideeffects call_macro that emits ExprConstBool.
tests/linq/test_linq_fold_ast.das Adds an impure-projection target and two AST tests asserting counter-lane bind elision behavior.
benchmarks/sql/LINQ.md Updates the select_count note to reference the new macro-time elision path.
Comments suppressed due to low confidence (4)

daslib/macro_boost.das:244

  • func_has_sideeffects treats any builtin function whose flags don't set knownSideEffects or unsafeOperation as side-effect-free. Combined with the operator paths above, this can misclassify mutation operators that aren't on the safe-op allowlist. In particular:
  • ExprOp1 with op ++, --, +++, --- falls through !is_safe_op1(op) and then through func_has_sideeffects(func). If the builtin overload for these increment/decrement operators on workhorse types is not flagged knownSideEffects/unsafeOperation, has_sideeffects returns false and reports x++ as pure.
  • ExprOp2 compound assignment operators (+=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>=, ||=, &&=, ^^=, <<<=, >>>=) are parsed as ExprOp2 (see InferTypes::isAssignmentOperator in src/ast/ast_infer_type_op.cpp:116). None are in is_safe_op2, but the same func_has_sideeffects fallback applies — if the resolved builtin doesn't carry one of the two flags, the mutation is classified as pure.

Since has_sideeffects is a public, general-purpose predicate (not just used for the linq counter lane), and the contract is "false is a promise of purity", these operators must be on the unsafe side unconditionally regardless of how the resolved builtin happens to be flagged. Consider blacklisting any ExprOp1 op ending in ++/-- and any ExprOp2 op ending in = (other than the already-handled relational ops) up front, before consulting func.flags. The test suite in tests/macro_boost/test_has_sideeffects.das doesn't cover these mutation operators, so a regression here would go undetected.

    if (expr is ExprOp1) {
        let e1 = expr as ExprOp1
        if (!is_safe_op1(e1.op) && func_has_sideeffects(e1.func)) return true
        return has_sideeffects(e1.subexpr)
    }
    if (expr is ExprOp2) {
        let e2 = expr as ExprOp2
        // Unsafe: division/modulo (div-by-zero panic, design decision); or op not in the
        // safe allowlist AND the resolved func indicates side effects. The allowlist also
        // bypasses func==null artifacts from partial folding.
        if (e2.op == "/" || e2.op == "%"
                || (!is_safe_op2(e2.op) && func_has_sideeffects(e2.func))) return true
        return has_sideeffects(e2.left) || has_sideeffects(e2.right)
    }

daslib/macro_boost.das:268

  • func_has_sideeffects returns true (i.e., "has side effects") when f == null OR !f.flags.builtIn OR f.flags.knownSideEffects OR f.flags.unsafeOperation. The parenthesization is correct, but the function name and the way callers use it (if (... && func_has_sideeffects(e1.func)) return true) make the polarity hard to follow — particularly the !f.flags.builtIn term, which says "any non-builtin function is unsafe". A short docstring clarification or renaming to e.g. func_may_have_sideeffects (matching the conservative semantics of has_sideeffects itself) would make this less error-prone for future maintainers. The fact that f == null is classified as "unsafe" here is also worth noting explicitly, since the surrounding comments at lines 222–225 describe the null-func case as something the allowlist "bypasses" — a reader could reasonably expect func_has_sideeffects(null) to be a safer default.
[macro_function]
def private func_has_sideeffects(f : Function?) : bool {
    //! True when calling `f` may have side effects. Allowlists builtins
    //! (`flags.builtIn`) without `knownSideEffects` or `unsafeOperation`.
    return (f == null || !f.flags.builtIn
        || f.flags.knownSideEffects || f.flags.unsafeOperation)
}

daslib/macro_boost.das:185

  • ExprCast recurses only into subexpr, which is correct, but casts that invoke a user-defined conversion function (cast through [generic]/overloaded conversion) can still carry side effects. In daslang, ExprCast is largely a typing/representation change, but if a future cast path lowers into a call expression, that would surface as an ExprCall rather than staying inside ExprCast. This is informational — no change required if the current AST guarantees no embedded calls in ExprCast — but worth confirming.
    if (expr is ExprCast) return has_sideeffects((expr as ExprCast).subexpr)

tests/macro_boost/test_has_sideeffects.das:160

  • The test file covers 24 cases as advertised, but doesn't exercise the mutation-operator paths (++, --, +=, -=, etc.) that flow through the func_has_sideeffects fallback in ExprOp1/ExprOp2. Given that the design of has_sideeffects explicitly comments out compound-assignment ops from is_safe_op2 (line 280: "compound-assignment ops are not in the allowlist (mutation)"), adding a test like t |> equal(_test_has_sideeffects((_x += 1, _x)), true) (or whatever construction the comma/sequence operator supports) would lock in the contract and catch regressions if the builtin flags change. Same for unary _x++. This is the most consequential gap in coverage for a "conservative side-effect predicate".
// ── UNSAFE cases — has_sideeffects must return true ──────────────────────

[test]
def test_user_call_unsafe(t : T?) {
    var _x = 5
    t |> equal(_test_has_sideeffects(side_effect_fn(_x)), true)
}

[test]
def test_table_insert_subscript(t : T?) {
    var _tab : table<string; int>
    // _tab[k] auto-inserts a default value if k is missing — side effect.
    t |> equal(_test_has_sideeffects(_tab["k"]), true)
}

[test]
def test_division_unsafe(t : T?) {
    var _x = 10
    var _y = 2
    // `/` can panic on div-by-zero — kept on the unsafe side by explicit blacklist.
    t |> equal(_test_has_sideeffects(_x / _y), true)
}

[test]
def test_modulo_unsafe(t : T?) {
    var _x = 10
    var _y = 3
    t |> equal(_test_has_sideeffects(_x % _y), true)
}

[test]
def test_array_literal_alloc(t : T?) {
    t |> equal(_test_has_sideeffects([1, 2, 3]), true)
}

[test]
def test_struct_construct_alloc(t : T?) {
    t |> equal(_test_has_sideeffects(Foo(a = 1, b = 2)), true)
}

[test]
def test_string_builder_unsafe_part(t : T?) {
    var _x = 5
    // The string interpolation itself is safe, but a side-effecting operand propagates.
    t |> equal(_test_has_sideeffects("hello {side_effect_fn(_x)}"), true)
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread daslib/macro_boost.das
Comment thread tests/macro_boost/test_has_sideeffects.das
Adds a reusable conservative `has_sideeffects(expr) : bool` predicate to
daslib/macro_boost. Returns true if an expression has — or might have —
side effects; false ONLY when provably pure. Intended for macro-time
elision of discardable evaluations.

Classification:
- Safe leaves: ExprVar, all ExprConst*, ExprAddr, ExprTypeInfo/Decl/Tag.
- Safe via recursion: ExprField/SafeField/Swizzle, ExprRef2Value/Ptr,
  ExprPtr2Ref, ExprAddr, ExprIs/IsVariant/AsVariant/SafeAsVariant,
  ExprCast, ExprNullCoalescing, ExprStringBuilder (string heap is
  no-op per compiler), ExprKeyExists (pure container read).
- ExprAt: safe when subexpr type is NOT isGoodTableType (tables auto-
  insert on missing key); ExprSafeAt always safe.
- ExprOp1/Op2/Op3: op-name allowlist for pure ops on workhorse types
  (bypasses func==null artifacts from partial folding); falls back to
  the function-flag check. `/` and `%` blacklisted (div-by-zero panic).
- ExprCall: allowlist `func.flags.builtIn && !knownSideEffects &&
  !unsafeOperation`, recurse args.
- Everything else: conservative true.

Counter-lane integration in daslib/linq_fold.das:

1. Discardable `var vfinal = projection` bind is now emitted only when
   `has_sideeffects(projection)` returns true. Pure projections like
   `_._field * 2` produce a bare-loop counter at macro time, no
   optimizer DCE required.

2. count→length shortcut: when the counter lane has no where-filter
   AND every projection in the chain is pure AND the source has a known
   length (array/table/string/range/fixed-array), the planner emits
   `length(src)` directly — the loop is elided entirely. select_count
   benchmark drops from 2 ns/op to 0 ns/op.

3. peel_each fix: `each` is a daslang generic, so the resolved
   `func.name` on a typed call is the mangled instance. The original
   peel only matched `func.name == "each"` and never fired for typed
   chains. Now also checks `func.fromGeneric.name == "each"`. Gated to
   array-shaped arguments (isGoodArrayType || isArray) so iterator-
   yielding sources like `each(range(10))` keep their wrapper.

4. Block-parameter typedecl branched on source shape: iterator sources
   keep `-const` (rvalue, must be consumable); array sources keep the
   source's `const&` modifier (peeled `let arr <-` is const-ref).

Tests:
- tests/macro_boost/test_has_sideeffects.das — 24 cases (17 safe + 5
  unsafe + 2 conservative-unsafe) wired via a `_test_has_sideeffects`
  probe call_macro that emits ExprConstBool at macro time.
- tests/linq/test_linq_fold_ast.das — 5 new tests:
  * test_pure_projection_uses_length_shortcut — invoke body returns
    `length(src)` directly, no for loop.
  * test_bare_count_uses_length_shortcut — same for `each(arr).count()`.
  * test_impure_projection_keeps_bind — for-body has bind + ++acc.
  * test_peel_each_on_array_source / _on_bare_count — assert peel fires.
  * test_peel_each_skips_non_array_source — `each(range(...))` keeps
    its wrapper (gate prevents iterator-source peeling).
  * test_target_each_range_count_runs — behavioral check for
    iterator-source chains.

Benchmarks (100K rows, INTERP, vs Phase 2A baseline):
- select_count: 2 → 0 ns/op (length shortcut elides loop entirely)
- chained_where: 8 → 6 ns/op (peel + const-ref param)
- count_aggregate: 5 → 4 ns/op (1ns from peel)
- to_array_filter: 11 → 10 ns/op (1ns from peel)

569/569 linq tests + 51/51 fold-AST + 24/24 has_sideeffects pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@borisbat borisbat force-pushed the bbatkin/has-sideeffects-counter-lane-elision branch from 19c7869 to c6a9d79 Compare May 16, 2026 23:06
@borisbat borisbat requested a review from Copilot May 16, 2026 23:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Comment thread daslib/macro_boost.das Outdated
Comment thread daslib/macro_boost.das
Comment thread daslib/linq_fold.das
Comment thread daslib/linq_fold.das
Comment thread tests/macro_boost/test_has_sideeffects.das
Comment thread daslib/macro_boost.das
…owlist

Two correctness fixes from Copilot review on PR #2691:

1. **Mutation operators bypass.** `++`, `--`, and compound-assignment ops
   (`+=`, `-=`, `*=`, `/=`, `%=`, `&=`, `|=`, `^=`, `<<=`, `>>=`,
   `&&=`, `||=`, `^^=`) fall through the `is_safe_op{1,2}` allowlist
   check, but the fallback through `func_has_sideeffects` only catches
   them if the resolved C++ builtin happens to carry the right flag. If
   the builtin missed the flag (or there is no resolved builtin), `x++`
   classifies as pure. Add `is_mutation_op1` / `is_mutation_op2`
   blacklists invoked up front, before any flag check. Note the AST
   op-string convention: postfix `++` / `--` are `"+++"` / `"---"`.

2. **User operator overloads bypass.** When `e2.op` is in the safe
   allowlist (`+`, `-`, `*`, `==`, etc.), the old code skipped the
   `func_has_sideeffects(e2.func)` check entirely. A user-defined
   `def operator +(...)` on a custom type would then classify as pure
   regardless of side effects. Restructure: `func != null` → trust the
   func flags (non-builtin defaults to unsafe via `func_has_sideeffects`);
   `func == null` → fall back to op-name allowlist for partial-folding
   artifacts.

Tests:
- `test_postfix_increment_unsafe`, `test_postfix_decrement_unsafe`
- `test_user_op_overload_unsafe` (defines `operator +` on a private
  struct with a global-counter side effect)

CI fix: register `tests/macro_boost/` in `tests/aot/CMakeLists.txt`
(missed when the test directory was created in the parent commit).
Mirrors the `tests/linq/` pattern: a test-files glob + a module-files
list for the `_has_sideeffects_probe.das` helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

@borisbat borisbat merged commit b99d5bf into master May 17, 2026
32 checks passed
pull Bot pushed a commit to forksnd/daScript that referenced this pull request May 17, 2026
…ssion

Cards added in the course of the linq_fold splice rewrite + PR GaijinEntertainment#2691
(has_sideeffects + counter-lane elision). Topics:

linq_fold / macro-emission patterns:
- daslang-generic-instance-detect-via-fromgeneric — func.fromGeneric is
  the canonical "which generic was this instantiated from?" link;
  func.name on typed instances is mangled.
- daslib-macro-boost-has-sideeffects-predicate — new public predicate,
  full classification table, known limitations, test plumbing.
- qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array — typedecl
  block-param const/ref handling differs between iterator and array
  sources; the two diagnostic error messages tell you which branch you
  picked wrong.
- qmacro-gensym-per-callsite-via-lineinfo — backtick-prefixed names +
  line+column suffix, force_at / force_generated / can_shadow.
- my-fold-macro-emits-a-loop-with-for-it-in-source-... (UPDATED) —
  peel_each pattern corrected for generic-instance detection + positive
  array gate + block-param typedecl handling.

LINQ semantics:
- are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-...
- which-typedecl-predicates-identify-types-where-length-expr-is-...
- why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-...
- what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-...
- my-macro-substitutes-it-for-a-projection-expression-via-template-...
- when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-...
- where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-...

Tooling / ops:
- how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-...
- cpp-profiler-macos-samply-instruments.md
- what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-...
- how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-...

Updated:
- why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-...
  — original card pointed at a require-order red herring; real cause
  was ref_time_ticks() returning ns on POSIX while wait_until_ready's
  deadline math assumed μs. Fix landed in PR GaijinEntertainment#2685.

No code changes — docs only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants